Far-Field ASR Without Parallel Data

نویسندگان

  • Vijayaditya Peddinti
  • Vimal Manohar
  • Yiming Wang
  • Daniel Povey
  • Sanjeev Khudanpur
چکیده

In far-field speech recognition systems, training acoustic models with alignments generated from parallel close-talk microphone data provides significant improvements. However it is not practical to assume the availability of large corpora of parallel close-talk microphone data, for training. In this paper we explore methods to reduce the performance gap between far-field ASR systems trained with alignments from distant microphone data and those trained with alignments from parallel close-talk microphone data. These methods include the use of a lattice-free sequence objective function which tolerates minor mis-alignment errors; and the use of data selection techniques to discard badly aligned data. We present results on single distant microphone and multiple distant microphone scenarios of the AMI LVCSR task. We identify prominent causes of alignment errors in AMI data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stream Attention for far-field multi-microphone ASR

A stream attention framework has been applied to the posterior probabilities of the deep neural network (DNN) to improve the far-field automatic speech recognition (ASR) performance in the multi-microphone configuration. The stream attention scheme has been realized through an attention vector, which is derived by predicting the ASR performance from the phoneme posterior distribution of individ...

متن کامل

Front-end for far-field speech recognition based on frequency domain linear prediction

Automatic Speech Recognition (ASR) systems usually fail when they encounter speech from far-field microphone in reverberant environments. This is due to the application of short-term feature extraction techniques which do not compensate for the artifacts introduced by long room impulse responses. In this paper, we propose a front-end, based on Frequency Domain Linear Prediction (FDLP), that tri...

متن کامل

Noise-matched training of CRF based sentence end detection models

Sentence end detection (SED) is an important task for many applications and has been studied on written text and automatic speech recognition (ASR) transcripts. In previous work it was shown that conditional random fields models gave best SED performance on a range of tasks, with and without the inclusion of prosodic features. So far, true transcripts were used for both training and evaluation ...

متن کامل

Speech Representation Learning Using Unsupervised Data-Driven Modulation Filtering for Robust ASR

The performance of an automatic speech recognition (ASR) system degrades severely in noisy and reverberant environments in part due to the lack of robustness in the underlying representations used in the ASR system. On the other hand, the auditory processing studies have shown the importance of modulation filtered spectrogram representations in robust human speech recognition. Inspired by these...

متن کامل

Hilbert Envelope Based Features for Far-Field Speech Recognition

Automatic speech recognition (ASR) systems, trained on speech signals from close-talking microphones, generally fail in recognizing far-field speech. In this paper, we present a Hilbert Envelope based feature extraction technique to alleviate the artifacts introduced by room reverberations. The proposed technique is based on modeling temporal envelopes of the speech signal in narrow sub-bands u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016